Will Bilevel Optimizers Benefit from Loops
Bilevel optimization has arisen as a powerful tool for solving a variety of machine learning problems. Two currently popular bilevel optimizers, AID-BiO and ITD-BiO, naturally involve solving one or two sub-problems, and consequently, whether we solve these sub-problems with loops (which take many iterations) or without loops (which take only a few iterations) can significantly affect the overall computational efficiency. Existing studies in the literature cover only some of these implementation choices, and the available complexity bounds are not refined enough to enable a rigorous comparison among different implementations. In this paper, we first establish a unified convergence analysis for both AID-BiO and ITD-BiO that is applicable to all implementation choices of loops. We then specialize our results to characterize the computational complexity of all implementations, which enables an explicit comparison among them. Our results indicate that for AID-BiO, the loop for estimating the optimal point of the inner function is beneficial for overall efficiency, although it incurs higher complexity per update step, and the loop for approximating the outer-level Hessian-inverse-vector product reduces the gradient complexity. For ITD-BiO, the two loops always coexist, and our convergence upper and lower bounds show that such loops are necessary to guarantee a vanishing convergence error, whereas the no-loop scheme suffers from an unavoidable non-vanishing convergence error. Our numerical experiments further corroborate our theoretical results.
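To make the loop structure concrete, here is a minimal AID-BiO sketch on a hypothetical quadratic bilevel problem; the problem instance, step sizes, and loop lengths N and M below are illustrative assumptions rather than settings from the paper, and N = M = 1 recovers the no-loop scheme.

```python
import numpy as np

# Hypothetical quadratic bilevel instance (for illustration only):
#   inner:  g(x, y) = 0.5 * ||A y - x||^2   (strongly convex in y)
#   outer:  f(x, y) = 0.5 * ||y - b||^2
# Each AID-BiO outer step runs (1) an N-step loop approximating y*(x),
# (2) an M-step loop approximating v = [grad2_yy g]^{-1} grad_y f,
# (3) one hypergradient step on x via the implicit-function theorem:
#   grad Phi(x) = grad_x f - grad2_xy g @ v.

rng = np.random.default_rng(0)
d = 5
A = np.eye(d) + 0.1 * rng.standard_normal((d, d))
b = rng.standard_normal(d)
H = A.T @ A                        # inner Hessian grad2_yy g

x, y = np.zeros(d), np.zeros(d)
alpha, eta, beta = 0.1, 0.5, 0.5   # inner, v-loop, and outer step sizes
N, M = 10, 10                      # loop lengths; N = M = 1 is "no-loop"

for t in range(300):
    # (1) inner loop: gradient steps on grad_y g = A^T (A y - x)
    for _ in range(N):
        y -= alpha * (A.T @ (A @ y - x))
    # (2) v-loop: gradient steps on q(v) = 0.5 v^T H v - v^T grad_y f,
    #     whose minimizer is H^{-1} grad_y f (here grad_y f = y - b)
    v = np.zeros(d)
    for _ in range(M):
        v -= eta * (H @ v - (y - b))
    # (3) hypergradient step; grad_x f = 0 and grad2_xy g = -A for this g,
    #     so grad Phi(x) is approximated by -(-A) @ v = A @ v
    x -= beta * (A @ v)

# for this instance the bilevel optimum is y* = b, x* = A b
print("||x - A b|| =", np.linalg.norm(x - A @ b))
```

Increasing N and M trades more computation per update step for a more accurate hypergradient, which is precisely the tradeoff the paper's complexity bounds quantify.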
Many thanks to the reviewers for their deep, thoughtful reviews and constructive suggestions.
We note that despite very recent observations on the empirical superiority of adaptive synchronization (e.g., …), it would surely be interesting to see whether our bound can be tightened.
R1, clarification on the log T communication rounds: for local SGD with periodic averaging, the proof techniques are more involved. We do not tune the learning rate.
Improved Quantization Strategies for Managing Heavy-tailed Gradients in Distributed Learning
Guangfeng Yan, Tan Li, Yuanzhang Xiao, Hanxu Hou, Linqi Song
Gradient compression has surfaced as a key technique for addressing the challenge of communication efficiency in distributed learning. In distributed deep learning, however, gradient distributions are observed to be heavy-tailed, with outliers significantly influencing the design of compression strategies. Existing parameter quantization methods suffer performance degradation when this heavy-tailed feature is ignored. In this paper, we introduce a novel compression scheme specifically engineered for heavy-tailed gradients, which effectively combines gradient truncation with quantization. This scheme is implemented within a communication-limited distributed Stochastic Gradient Descent (SGD) framework. Considering a general family of heavy-tailed gradients that follow a power-law distribution, we aim to minimize the error resulting from quantization, thereby determining optimal values for two critical parameters: the truncation threshold and the quantization density. We provide a theoretical analysis of the convergence error bound under both uniform and non-uniform quantization scenarios. Comparative experiments against other benchmarks demonstrate the effectiveness of our proposed method in managing heavy-tailed gradients in a distributed learning environment.
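Below is a minimal sketch of the truncate-then-quantize idea on synthetic power-law data. The helper truncate_and_quantize, the fixed threshold tau = 3.0, and the 15-level grid are hypothetical illustrative choices (the paper derives optimal values for these two parameters); the quantizer shown is the plain unbiased stochastic uniform one.

```python
import numpy as np

def truncate_and_quantize(g, tau, s, rng):
    """Clip coordinates to [-tau, tau], then apply s-level stochastic
    uniform quantization, which is unbiased for the clipped vector."""
    g_clip = np.clip(g, -tau, tau)
    u = (g_clip + tau) / (2.0 * tau)   # map to [0, 1]
    scaled = u * s                     # position on the grid {0, 1, ..., s}
    lower = np.floor(scaled)
    prob_up = scaled - lower           # round up w.p. (scaled - lower), so E[q] = scaled
    q = lower + (rng.random(g.shape) < prob_up)
    return (q / s) * 2.0 * tau - tau   # map grid points back to [-tau, tau]

rng = np.random.default_rng(0)
# synthetic heavy-tailed "gradient": power-law magnitudes with random signs
g = rng.pareto(2.5, size=10_000) * rng.choice([-1.0, 1.0], size=10_000)
g_hat = truncate_and_quantize(g, tau=3.0, s=15, rng=rng)
print("quantization MSE vs clipped gradient:",
      np.mean((g_hat - np.clip(g, -3.0, 3.0)) ** 2))
```

The stochastic rounding keeps the output unbiased for the clipped vector, so truncation bias and quantization variance can be traded off against each other through the threshold tau and the number of levels s.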